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Abstract The North Pacific exhibits patterns of low-frequency 
variability on the intra-annual to decadal time scales, which 
manifest themselves in both model data and the observa¬ 
tional record, and prediction of such low-frequency modes 
of variability is of great interest to the community. While 
parametric models, such as stationary and non-stationary au¬ 
toregressive models, possibly including external factors, may 
perform well in a data-fitting setting, they may perform poorly 
in a prediction setting. Ensemble analog forecasting, which 
relies on the historical record to provide estimates of the 
future based on past trajectories of those states similar to 
the initial state of interest, provides a promising, nonpara- 
metric approach to forecasting that makes no assumptions 
on the underlying dynamics or its statistics. We apply such 
forecasting to low-frequency modes of variability for the 
North Pacific sea surface temperature and sea ice concentra¬ 
tion fields extracted through Nonlinear Laplacian Spectral 
Analysis. We find such methods may outperform paramet¬ 
ric methods and simple persistence with increased predictive 
skill. 


1 Introduction 

Predictability in general circulation models (GCMs) for the 
North Pacific, from seasonal to decadal time scales, has been 
the subject of many recent studies, (e.g., Tietsche et al (2014), 


decadal variability, such as the Pacific Decadal Oscillation 
(PDO; Mantua et al, 1997), and the more rapidly decorrelat- 
ing North Pacific Gyre Oscillation (NPGO; Di Lorenzo et al, 
2008), both of which have been the subject of much interest. 
An important phenomenon of intra-annual variability in the 
North Pacific is the reemergence of anomalies in both the sea 
surface temperature (SST) fields (Alexander et al, 1999), as 
well as in the sea ice concentration (SIC) field (Blanchard- 
Wrigglesworth et al, 2011a), where regional anomalies in 
these state variables vanish over a season, and reappear sev¬ 
eral months later, as made evident by high time-lagged cor¬ 
relations. 

The North Pacific (along with the North Atlantic) is a re¬ 
gion of relative strong low-frequency variability in the global 
climate system (Branstator et al, 2012). Yet, in GCMs it 
has been shown that this region shows relative lack of pre¬ 
dictability (less than a decade; Collins, 2002), with the Com¬ 
munity Climate System Model (CCSM) having particularly 
weak persistence and low predictability in the North Pacific 
among similiar GCMs (Branstator et al, 2012). The ocean 
and sea ice systems show stronger low-frequency variabil¬ 
ity than the atmosphere (Newman et al, 2003). The internal 
variability exhibited in the North Pacific has also been char¬ 
acterized as being in distinct climate regimes (e.g. Overland 
et al (2006,2008)), where the dynamics exhibit regime tran¬ 
sitions and metastability. As a result, cluster based meth¬ 
ods have been a popular approach to model regime behavior 

Blanchard-Wrigglesworth and Bitz (2014), Blanchard-WriggleswiSllmate settings, such as in Franzke et al ( 2008 , 2009), 


et al (2011a)), several of which focus on the role of the 
initial state (e.g., Collins (2002), Blanchard-Wrigglesworth 
et al (201 lb), Branstator et al (2012), Day et al (2014)). The 
North Pacific exhibits prominent examples of interannual to 
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where hidden Markov models were used to model atmo¬ 
spheric flows. 

A traditional modeling approach for the PDO has been 
to fit an autoregressive process (Hasselmann, 1976; Frankig- 
noul and Hasselmann, 1977), as well as models being ex¬ 
ternally forced from the tropics through ENSO (Newman 
et al, 2003). In Giannakis and Majda (2012b), autoregres¬ 
sive models were successful in predicting temporal patterns 
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corresponding to the PDO and NPGO when forced with 
suitably modulated intermittent modes. Additional flexibil¬ 
ity can be built into regression models by allowing nonsta¬ 
tionary, i.e. time dependent coefficients, and a successful ex¬ 
ample of this is the finite element method (FEM) clustering 
procedure combined with multivariate autoregressive factor 
(VARX) model framework (Horenko, 2010a,b). In this ap¬ 
proach, the data is partitioned into a predetermined num¬ 
ber of clusters, and model regression coefficients are es¬ 
timated on each cluster, together with a cluster affiliation 
function that indicates which model is used at a particular 
time. This can further be adapted to account for external 
factors (Horenko, 2011a). While FEM-VARX methods can 
be effective at fitting the desired data, advancing the sys¬ 
tem state in the future for a prediction is dependent on be¬ 
ing able to successfully predict the unknown cluster affili¬ 
ation function. Methods for using this framework in a pre¬ 
diction setting have been used in Horenko (2011a,b). An¬ 
other regression modeling approach was put forward in Ma- 
jda and Harlim (2013), where physics constraints were im¬ 
posed on multilevel nonlinear regression models, preventing 
ad-hoc, finite-time blow-up, a pathological behavior previ¬ 
ously shown to exist in such models without physics con¬ 
straints in Majda and Yuan (2012). Appropriate nonlinear 
regression models using this strategy have been shown to 
have high skill for predicting the intermittent cloud patterns 
of tropical intra-seasonal variability (Chen et al, 2014; Chen 
and Majda, 2015b,a). 

Parametric models may perform well for fitting, but of¬ 
ten have poor performance in a prediction setting, partic¬ 
ularly in systems that exhibit distinct dynamical regimes. 
Nonparametric models can be advantageous in systems where 
the underlying dynamical system is unknown or imperfect. 
An early example of this is an analog forecast, first intro¬ 
duced by Lorenz (1969), where one considers the histori¬ 
cal record, and makes predictions based on examining state 
variable trajectories in the past that are similiar to the cur¬ 
rent state. Analog forecasting does not make any assump¬ 
tions on the underlying dynamics, and cleverly avoids model 
error when the underlying model is observations from na¬ 
ture. This has since been applied to other climate predic¬ 
tion scenarios, such as the Southern Oscillation Index (Dros- 
dowsky, 1994), the Indian summer monsoon (Xavier and 
Goswami, 2007), and wind forecasting (Alessandrini et al, 
2015), where it was found to be particularly useful in fore¬ 
casting rare events. 

Key to the success of an analog forecasting method is 
the ability to identify a good historical analog to the current 
initial state. In climate applications, the choice of analog is 
usually determined by minimizing Euclidean distance be¬ 
tween snapshots of system states, with a single analog being 
selected (Lorenz, 1969; Branstator et al, 2012). In Zhao and 
Giannakis (2014), analog forecasting was extended upon in 


two key ways. First, the state vectors considered were in 
Takens lagged embedding space (Takens, 1981), which cap¬ 
tures some of the dynamics of the system, rather than a snap¬ 
shot in time. Second, instead of selecting a single analog de¬ 
termined by Euclidean distance, weighted sums of analogs 
were considered, where weights are determined by a ker¬ 
nel function. In this context, a kernel is an exponentially de¬ 
caying pairwise similarity measure, intuitively, playing the 
role of a local covariance matrix. In Zhao and Giannakis 
(2014), kernels were introduced in the context of Nonlin¬ 
ear Spectral Analysis (NLSA; Giannakis and Majda, 2012c, 
2013, 2014) for decomposing high-dimensional spatiotem- 
poral data, leading naturally to a class of low-frequency ob¬ 
servables for prediction through kernel eigenfunctions. In 
Bushuk et al (2014), the kernel used in the NLSA algorithm 
was adapted to be multivariate, allowing for multiple vari¬ 
ables, possibly of different physical units, to be considered 
jointly in the analysis. 

The aim of this study is to develop low dimensional, 
data-driven models for prediction of dominant low-frequency 
climate variability patterns in the North Pacific, combining 
the approaches laid out in Zhao and Giannakis (2014) and 
Bushuk et al (2014). We invoke a prediction approach that 
is purely statistical, making use only of the available histori¬ 
cal record, possibly corrupted by model error, and the initial 
state itself. The approach utilizes out-of-sample extension 
methods (Coifman and Lafon, 2006b; Rabin and Coifman, 
2012 ) to define our target observable beyond a designated 
training period. There are some key differences between this 
study and that of Zhao and Giannakis (2014). First, we con¬ 
sider multivariate data, including SIC, which is intrinsically 
noisier than SST, given that SIC is a thresholded variable 
and experiences high variability in the marginal ice zone. 
We also make use of observational data, which is a rela¬ 
tively short time series compared to GCM model data. In 
addition to considering kernel eigenfunctions as our target 
observable, which we will see is well-suited for prediction, 
we also consider integrated sea ice anomalies, a more chal¬ 
lenging observable uninfluenced by the data analysis algo¬ 
rithm. We compare performance of these kernel ensemble 
analog forecasting techniques to some parametric forecast 
models (specifically, autoregressive and FEM-VARX mod¬ 
els), and find the former to have higher predictive skill than 
the latter, which often can not even outperform the simple 
persistence forecast. 

The rest of the paper is outlined as follows. In Section 2, 
we discuss the mathematical methods used to perform ker¬ 
nel ensemble analog forecasting predictions, and also dis¬ 
cuss alternative parametric forecasting methods. In Section 
3 we describe the data sets used for our experiments, and in 
Section 4 we present our results. Discussion and concluding 
remarks are in Section 5. 
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2 Methods 

Our forecasting approach is motivated by using kernels, a 
pairwise measure of similarity for the state vectors of inter¬ 
est. A simple example of such an object is a Gaussian kernel: 

w{xi,xj)=e ''0 , ( 1 ) 

where x is a state variable of interest, and ao is a scale pa¬ 
rameter that controls the locality of the kernel. The kernels 
are then used to generate a weighted ensemble of analog 
forecasts, making use of an available historical record, rather 
than relying on a single analog. To extend the observable we 
wish to predict beyond the historical period into the future, 
we make use of out-of-sample extension techniques. These 
techniques allow one to extend a function (observable) to a 
new point y by looking at values of the function at known 
points Xi close to y, where closeness will be determined by 
the kernels. An important part of any prediction problem is 
to choose a target observable that is both physically mean¬ 
ingful and exhibits high predictability. As demonstrated in 
Zhao and Giannakis (2014), defining a kernel on the desired 
space naturally leads to a preferred class of observables that 
exhibit time-scale separation and good predictability. 


2.1 Nonlinear Laplacian spectral analysis 

Nonlinear dynamical systems generally give rise to datasets 
with low-dimensional nonlinear geometric structures (e.g., 
attractors). We therefore turn to a data analysis technique, 
NTS A, that allows us to extract spatio-temporal patterns from 
data from a high-dimensional non-linear dynamical system, 
such as a coupled global climate system (Giannakis and Ma- 
jda, 2012a,c, 2013, 2014). The standard NLSA algorithm 
is a nonlinear manifold generalization of singular spectrum 
analysis (SSA) (Ghil et al, 2002), where the covariance op¬ 
erator is replaced by a discrete Laplace-Beltrami operator 
to account for non-linear geometry on the underlying data 
manifold. The eigenfunctions of this operator then form a 
convenient orthonormal basis on the data manifold. A key 
advantage of NLSA is there is no pre-processing of the data 
needed, such as band-pass filtering or seasonal partitioning. 

2.1.1 Time-lagged embedding 

An important first step in the NLSA algorithm is to perform 
a time-lagged embedding of the spatio-temporal data as a 
method of inducing time-scale separation in the extracted 
modes. An analog forecast method is driven by the initial 
data, and to incorporate some of the system’s dynamics into 
account in selecting an analog, instead of using a snapshot 
in time of the state as our initial condition, time-lagged em¬ 
bedding helps make the data more Markovian. 


Let z{ti) S be a time series sampled uniformly with 
time step 5t, on a grid of size d, with i = l,...,N samples. 
We construct a lag-embedding of the data set with a window 
of length q, and consider the lag-embedded time series 

x{ti) = 

The data now lies in R"*, with m = dq the dimension of the 
lagged embedded space, and n = N — q-\-1 number of sam¬ 
ples in lagged embedded space (also called Takens embed¬ 
ding space, or delay coordinate space). It has been shown 
that time-lagged embedding recovers the topology of the at¬ 
tractor of the underlying dynamical system that has been 
lost through partial observations (Takens, 1981; Sauer et al, 
1991). In particular, the embedding affects the non-linear ge¬ 
ometry of the underlying data manifold in such a way to 
allow for dynamically stable patterns with time scale sepa¬ 
ration (Berry et al, 2013), a desirable property that will lead 
to observables with high predictability. 


2.1.2 Discrete Laplacian 


The next step is to define a kernel on the data manifold ./#. 
Rather than using a simple Gaussian as in Equation (1), the 
NLSA kernel makes use of phase velocities = ||x(f,) — 
x(f,_i)||, which forms a vector field on the data manifold 
and provides additional important dynamic information. The 
NLSA kernel we use is 


K{x{ti),x{tj)) =exp 


\\x{ti)-x{tj)\\^ 


( 2 ) 


With this kernel K and associated matrix Kij — K {x{ti),x{tj)), 
we solve the Laplacian eigenvalue problem to acquire an 
eigenfunction basis {0,} on the data-manifold To do 
this, we construct the discrete graph Laplacian by following 
the diffusion maps approach of Coifman and Lafon (2006a), 
and forming the following matrices: 


Q‘ — H — rtaLa ’ 

j=\ '^i '^j 

Di=f^Ktj, Pij = ^, Uj=I-Pj. 

f=l 

Here a is a real parameter, typically with value 0, 5 , 1. We 
note in the large data limit, as n — 00 and e — 0 , this dis¬ 
crete Laplacian converges to the Laplace-Beltrami operator 
on for a Riemannian metric that depends on the kernel 
(Coifman and Lafon, 2006a). We can therefore think of the 
kernel as biasing the geometry of the data to reveal a class of 
features, and the NLSA kernel does this in such a way as to 
extract dynamically stable modes with time scale separation. 
We then solve the eigenvalue problem 


L^i — 


(3) 
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The resulting Laplacian eigenfunctions (pi = {(pu, • • •, <pni)'^ 
form an orthonormal basis on the data manifold ^ with 
respect to the weighted inner product; 


n 


{(pij^pj) — ^ D/^ph^iCpifj — 5ij. 
k=l 


As well as forming a convenient orthonormal basis, the 
eigenfunctions (pi give us a natural class of observables with 
good time-scale separation and high predictability. These 
eigenfunctions pi are time series, nonlinear analogs to prin¬ 
cipal components, and can be used to recreate spatio-temporal 
modes, similar to extended empirical orthogonal functions 
(EOFs) (Giannakis and Majda, 2012b,c, 2013). However, 
unlike EOFs, the eigenfunctions pi do not measure variance, 
but rather measure oscillations or roughness in the abstract 
space the underlying data manifold. The eigenvalues Xi 
measure the Dirichlet energy of the corresponding eigen¬ 
functions, which has the interpretation of being squared wave 
numbers on this manifold (Giannakis and Majda, 2014). We 
now use the leading low-frequency kernel eigenfunctions as 
our target observables for prediction. 


2.1.3 Multiple components 

As described in Bushuk et al (2014), the above NLSA al¬ 
gorithm can be modified to incorporate more than one time 
series. Let be two signals, sampled uniformly with 

time step 5t, on (possibly different) di,d 2 grid points. Af¬ 
ter lag-embedding each variable with embedding windows 
qi,q 2 to its appropriate embedding space, so z^'^ (fi) € i—>■ 

(ti) e and z(2^ (u) € ^ (f,-) e , we con¬ 

struct the kernel function K by scaling the physical variables 
to be dimensionless by 


( \\xWiti)-xWitj)f \\x^^\ti)-x(^Htj)\A 

\ ) 


2.2 Out-of-sample extension 

Now that we have established our class of target observ¬ 
ables, namely the eigenfunctions pi in Equation (3), we need 
a method for extending these observables into the future 
to form our predictions, for which we draw upon out-of- 
sample extension techniques. To be precise, let / be a func¬ 
tion defined on a set M = {xi,... ,x„},x,' € K™; / may be 
vector-valued, but in our case is scalar. We wish to make 
a prediction of / by extending the function to be defined 
on a point outside of the training set M, by performing an 
out-of-sample extension, which we call /. There are some 
desireable qualities we wish to have in such an extension, 
namely that / is in some way well-behaved and smooth on 
our space, and is in consistent as the number of in-samples 
increases. Below we discuss two such methods of out-of- 
sample extension, the geometric harmonics, based on the 
Nystrom method, and Laplacian pyramids. 


2.2.1 Geometric harmonics 


The first approach for out-of-sample extension is based on 
the Nystrom method (Nystrom, 1930), recently adapted to 
machine learning applications (Coifman and Lafon, 2006b), 
and is based on representing a function / on M in terms of an 
eigenfunction basis obtained from a spectral decomposition 
of a kernel. While we use the NLSA kernel (Equation 2), 
other kernels could be used, another natural choice being 
a Gaussian kernel. For a general kernel w : R.”” x R™ R, 
consider its row-sum normalized counterpart: 


W{yi,Xj) 


Ayi,xj) 

'Ljw{yi,Xj)' 


These kernels have the convenient interpretation of forming 
discrete probability distributions in the second argument, de¬ 
pendent on the first argument, so W (y,x) = Py{x), which will 
be a useful perspective later in our analog forecasting in Sec¬ 
tion 2.3. We then solve the eigenvalue problem 


n 


h(Pi{xi) = Y^W{xi,Xj)(pi{xj). 
f=i 


This can then be extended to any number of variables, 
regardless of physical units, and allows for analysis in cou¬ 
pled systems, such as the ocean and sea ice components of 
a climate model. An alternative approach to Equation (4) in 
extending the NLSA kernel to multiple variables with dif¬ 
ferent physical units is to first normalize each variable to 
unit variance. However a drawback of this approach is that 
information about relative variability is lost as the ratios of 
the variances are set equal, rather than letting the dynam¬ 
ics of the system control the variance ratios of the different 
variables (Bushuk et al, 2014) as in Equation (4). 


We note the spectral decomposition of W yields a set of real 
eigenvalues Xi and an orthonormal set of eigenfunctions (pi 
that form a basis for L^{M) (Coifman and Lafon, 2006b), 
and we can thus represent our function / in terms of this 
basis: 

f{xi)='^{(PjJ)(Pj{xi). (5) 

.7=1 

Let y € R'”,y ^ M be an out-of-sample, or test data, point, 
to which we wish to extend the function /. If Xi ^ 0, the 
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eigenfunction (pi can be extended to any y € K™ by 

9/(y) = ^ E ^ iy,xj)(pi(xj). (6) 

A/ 

This definition ensures that the out-of-sample extension is 
consistent when restricted to M, meaning (pi{y) = (Pi{y) for 
y €M. Combining Equations (5) and (6) allows / to be as¬ 
signed for any y € ffi.'” by evaluating the eigenfunctions (pi at 
>’ and using the projection of / onto these eigenfunctions as 
weights: 

f{y) = Y.{<PiJ)9j{y)- ( 7 ) 

j=i 

Equation (7) is called the Nystrom extension, and in Coif- 
man and Lafon (2006b) the extended eigenfunctions in Equa¬ 
tion (6) are called geometric harmonics. We note this scheme 
becomes ill conditioned since A/ —0 as 1 —5- oo (Coifman 
and Lafon, 2006b), so in practice there is a truncation of 
the sum at some level I, usually determined by the decay of 
the eigenvalues. With the interpretation of the eigenvalues X 
as wavenumbers on the underlying data manifold, this trun¬ 
cation represents removing features from the data that are 
highly oscillatory in this space. 

2.2.2 Laplacian pyramid 

The geometric harmonics method is well suited for observ¬ 
ables that have a tight bandwidth in the eigenfunction ba¬ 
sis (particularly the eigenfunctions themselves), but for ob¬ 
servables that may require high levels of eigenfunctions in 
their representation, the above mentioned ill-conditioning 
may hamper this method. An alternative to geometric har¬ 
monics is the Laplacian pyramid (Rabin and Coifman, 2012), 
which invokes a multiscale decomposition of the original 
function / in its out-of-sample extension approach. 

A family of kernels defined at different scales is needed, 
and for clarity of exposition we will use a family of Gaus¬ 
sian kernels w/ (and their row-normalized counterparts Wi) 
at scales 1: 

wi{xi,Xj)=e ( 8 ) 

That is, 1 = 0 represents the widest kernel width, and in¬ 
creasing I gives finer kernels resolving more localized struc¬ 
tures. 

Eor a function f : M ^ M., the Laplacian pyramid rep¬ 
resentation of / approximates / in a multiscale manner by 
/ w So + ■51 + *2 H- 5 where the first level sq is defined by 

n 

soi^k) = Y.^o{xi,Xk)f{xi), 

i=l 


and we then evaluate the difference: 
di =/-so. 

We then iteratively define the 1th level decomposition sp 

n l—\ 

^li^k) = Y^Wi{xi,Xk)di{xi), di = 

i= 1 /=0 

Iteration is continued until some prescribed error tolerance 
Wf-Lk^kW <eis met. 

Next we extend / to a new point y € K.'”,y ^ D by 
■5o(y) = 

/= 1 /= 1 

for I > 1, and assign / the value 

f{y)=Y.^k{y)- ( 9 ) 

k 

That is, we have formed a multiscale representation of / 
using weighted averages of / for nearby inputs, where the 
weights are given by the scale of the kernel function. Since 
the kernel function can accept any inputs from M"*, we can 
define these weights for other points outside M, and thus de¬ 
fine / by using weighted values of / on M (known), where 
now the weights are given by the proximity of the out-of- 
sample y ^ M to input points x, S M. The parameter choices 
of the initial scale Oq and error tolerance e set the scale and 
cut-off of the dyadic decomposition of / in the Laplacian 
pyramid scheme. We choose Ob to be the median of the pair¬ 
wise distances of our training data, and the error tolerance e 
to be scaled by the norm of the observable over the training 
data, for example 10^^||/||. In our applications below, we 
use a multiscale family of NLS A kernels based on Equation 
(4) rather than the above family of Gaussian kernels. 

2.3 Kernel ensemble analog forecasting 

The core idea of traditional analog forecasting is to iden¬ 
tify a suitable analog to one’s current initial state from a 
historical record, and then make a prediction based on the 
trajectory of that analog in the historical record. The analog 
forecasting approach laid out in Zhao and Giannakis (2014), 
which we use here, varies in a few important regards. Lirst, 
the initial system state, as well as the historical record (train¬ 
ing data), is in Takens embedding space, so that an analog is 
not determined by a snapshot in time alone, but the current 
state with some history (a ‘video’). Second, rather than us¬ 
ing Euclidean distance, as in traditional analog forecasting, 
the distances we use are based on a defined kernel function, 
which reflects a non-Euclidean geometry on the underlying 
data manifold. The choice of geometry is influenced by the 
lagged embedding and choice of kernel, which we have done 
in a way that gives us time scale separation in the resulting 




6 


Darin Comeau et al. 


eigenfunctions (j), yielding high predictability. Third, rather 
than identify and use a single analog in the historical record, 
weighted sums of analogs are used, with weights determined 
by the kernel function. 

For this last point, it is useful to view analog forecasting 
in the context of an expectation over an empirical probability 
distribution. For example, a traditional analog forecast of a 
new initial state y at some time T in the future, based on 
selected analog xj = x{tj), can be written as 

f{y, t) = = Y, Pyixi)f {x{ti + t)) = / {x{tj + t)) , 

1=1 

where py = is the Dirac delta function, and Srfixi) = 
f{x{ti + t)) is the operator that shifts the timestamp of x, 
by T. To make use of more than one analog and move to 
an ensemble, we let py be a more general discrete empir¬ 
ical distribution, dependent on the initial condition y, with 
probabilities (weights) determined by our kernel function. 
Writing in this way, we simply need to define the empiri¬ 
cal probability distribution py to form our ensemble analog 
forecast. 

For the geometric harmonics method, we projected / 
onto an eigenfunction basis (truncated at some level 1) 
and performed an out-of-sample extension on each eigen¬ 
function basis function, i.e. 

/=1 

Or, to write this in terms of an expectation over an empirical 
probability measure: 

/ 

f{y) = Ep,/ = ^(0;,/)Ep^0i(y), 

/=1 

where 

Ep,^i = Y E 

We then define our prediction for lead time t via geometric 
harmonics by 

i 

1=1 

where 

EpySrtjfi = Y Y'^(y^Xj)(j)i{x{tj -f t)). 
i=l 

Similarly, the T shifted ensemble analog forecast via Lapla- 
cian pyramids is then 

i 

= '^Py,o^T:f + E ^Py,i^T:‘^h 

1=1 


where Pyj{x) = Wi{y,x) corresponds to the probability dis¬ 
tribution from the kernel at scale i. 

Thus we have a method for forming a weighted ensem¬ 
ble of predictions, that is non-parametric and data-driven 
through the use of a historical record (training data), which 
itself has been subject to analysis that reflects the dynamics 
of the high-dimensional system in the non-linear geometry 
on the underlying abstract data manifold and produces a 
natural preferred class of observables to target for prediction 
through the kernel eigenfunctions. 

2.4 Autoregressive modeling 

We wish to compare our ensemble analog prediction meth¬ 
ods to more traditional parametric methods, namely autore¬ 
gressive models of the North Pacific variability (Frankignoul 
and Hasselmann, 1977), of the form 

x{t + l) = p{t) +A{t)x{t) + a{t)e{t) 

where x{t) is our signal, /r(f) is the external forcing (pos¬ 
sibly 0), A(f) is the autoregressive term, and a(t) is the 
noise term, each to be estimated from the training data, and 
e(f) is a Gaussian process. In the stationary case, the model 
coefficients — jj., A{t) = A, a{t) = a are constant in 
time, and can be evaluated in an optimal way through ordi¬ 
nary least squares. In a non-stationary case, we will invoke 
the FEM-VARX framework of Horenko (2010b) by cluster¬ 
ing the training data into K clusters, and evaluating coef¬ 
ficients Ap, Op, k= \ . ,K for each cluster (see Horenko 

(2010a,b) for more details on this algorithm). From the algo¬ 
rithm we obtain a cluster identification function F(t), such 
that r{ti) = k indicates at time f, the model coefficients are 
A(f,) = Ap, o(f,) = Op. In addition to choosing the number 
of clusters K, the method also has a persistence parameter 
C that governs the number of allowable switches between 
clusters that must be chosen prior to calculating model co¬ 
efficients. These parameters are usually chosen to be opti¬ 
mal in the sense of the Akaike Information Criterion (AIC) 
(Horenko, 2010b; Metzner et al, 2012), an information the¬ 
oretic based measure for model selection which penalizes 
overfitting by large number of parameters. 

As mentioned earlier, while non-stationary autoregres¬ 
sive models may perform better than stationary models in 
fitting, an inherent difficulty in a prediction setting is the ad¬ 
vancement of the model coefficients A{t),a{t) beyond the 
training period, which in the above framework amounts to 
solely advancing the cluster affiliation function F(f). If we 
call 7rp(f) the probability of the model being at cluster k at 
time f, we can view F(t) as determining a Markov switching 
process on the cluster member probabilities 


n{t) = {ni{t),...,nK{t)), 
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which over the training period will be 1 in one entry, and 0 
elsewhere at any given time. We can estimate the transition 
probability matrix T of that Markov process by using the op¬ 
timal cluster affiliation sequence F(f) from the FEM-VARX 
framework (here optimal is for the training period). Assum¬ 
ing the Markov hypothesis, we can estimate the stationary 
probability transition matrix directly from F (f) by: 


Tij = 


Ni, 


T,k=l^ik 


where Nij is the number of direct transitions from state i 
to state j (Horenko, 2011a; Franzke et al, 2009). This esti¬ 
mated transition probability matrix T can be used to model 
the Markov switching process in the following ways. 


= AE( 3'(0 + T)-KT))^ 

n 

= A E (^(0 +'^) 

n 

An important note is that for data-driven observables, such 
as NLS A eigenfunctions or EOF principal components, there 
is no underlying ground truth when predicting into the fu¬ 
ture. As such a ground truth for comparison needs to be de¬ 
fined when evaluating the error metrics, for which one can 
use the out-of-sample extended function f{y{tj + t)) as de¬ 
fined in Equations (7) and (9). 

3 Datasets 


2.4.1 Generating predictions of cluster affiliation 


3.1 CCSM model output 


The first method we employ is to advance the cluster mem¬ 
ber probabilities n{t) using the estimated probability transi¬ 
tion matrix T by the deterministic equation (Franzke et al, 
2009): 

nfo + i) = nffjT'' 

where nfo) = {7l\ (to), ^ 2 (^ 0 )) is the initial cluster affiliation, 
which is determined by which cluster center the initial point 
x(to) is closest to, and Trfto) is either 0 or 1. 

The second method we employ is to use the estimated 
transition matrix T to generate a realization of the Markov 
switching process Tr, and use this to determine the model 
cluster at any given time, maintaining strict model affilia¬ 
tion. Thus nk{t) = 1 is r^f) = k, and 0 otherwise. 


2.5 Error metrics 


To gauge the fidelity of our predictions, we will evaluate the 
average root-mean-square error (RMSE) and pattern corre¬ 
lation (PC) of our predictions y against the ground truth x, 
where points in our test data set (of length n') are used as in- 
tial conditions for our predictions. As a benchmark, we will 
compare each prediction approach to a simple persistence 
forecast y(T) = y(0). The error metrics are calculated as 


1 ” 

rms^(x) = - ^ (y(tj + t) -x(tj + x)f , 

" ;=i 

= 7 S -577KW-' 


.7=1 


where 


y(T:) = A E = A E 


" 7=1 


" 7=1 


We use model data from the Community Climate System 
Model (CCSM), versions 3 and 4, for monthly SIC and SST 
data, restricted to the North Pacific, which we define as 20°- 
65°N, 120°E-110°W. CCSM3 model data is used from a 
900 year control run (experiment b30.004) (Collins et al, 
2006). The sea ice component is the Community Sea Ice 
Model (CSIM; Holland et al, 2006) and the ocean compo¬ 
nent is the Parallel Ocean Program (POP), both of which are 
sampled on the same nominal 1° grid. CCSM4 model data 
is used from a 900 year control run (experiment 640.1850), 
which uses the Community Ice CodE 4 model for sea ice 
(CICE4; Hunke and Lipscomb, 2008) and the POP2 model 
for the ocean component (Smith et al, 2010), also on a com¬ 
mon nominal 1° grid. Specific differences and improvements 
between the two model versions can be found in Gent et al 
( 2011 ). 

NLSA was performed on these data sets, both in sin¬ 
gle and multiple component settings, with the same embed¬ 
ding window of qi = q 2 = q = 24 months for each variable, 
kernel scale £ = 2, and kernel normalization a = 0. A 24 
month embedding window was chosen to allow for dynam¬ 
ical memory beyond the seasonal cycle, as is used in other 
NLSA studies (e.g., Giannakis and Majda (2012b, 2013); 
Bushuk et al (2014)). There are 6648 grid points for SST and 
3743 for SIC in this region for these models, and with an em¬ 
bedding window of ^ = 24, this means our lagged embedded 
data lies in R"* with m = 159,552 for SST and m = 89,832 
for SIC. For the purposes of a perfect model experiment, 
where the same model run is used for both the training and 
test data, we split the CCSM4 control run into two 400 year 
sets; years 100 - 499 for the training data set, and years 500 
- 899 for out-of-sample test points. After embedding, this 
leaves us with n = n' = 4777 samples in each data set. In 
our model error experiment, we train on 800 years of the 
CCSM3 control run, and use 800 years of CCSM4 for test 
data, giving us n = n' = 9577 data points. 
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In addition to low-frequency kernel eigenfunctions as 
observables, we also consider North Pacific integrated sea 
ice extent anomalies as a target for prediction in Section 4.2. 
This observable exhibits faster variability, on an intra-annual 
time-scale, and as such we reduce the embedding window 
from 24 months to 6 months for our kernel evaluation. Using 
the CCSM4 model training data, we define monthly anoma¬ 
lies by calculating a climatology fc of monthly mean sea ice 
extent. Let Vj be gridpoints in our domain of interest (North 
Pacific), c the mean SIC, a the grid cell area, and then the 
sea ice extent anomaly hat will be our target observable is 
defined as 

fiti) = Y,c{vj,ti)a{vj)-fc{ti). (10) 

j 


3.2 Observational data 

For observational data, we turn to the Met Office Hadley 
Centre’s HadlSST data set (Rayner et al, 2003), and use 
monthly data for SIC and SST, from years 1979-2012, sam¬ 
pled on a 1° latitude-longitude grid. We assign ice covered 
grid points an SST value of — 1.8°C, and have removed a 
trend from the data by calculating a linear trend for each 
month. There are 4161 spatial grid points for SST, for a 
lagged embedded dimension of m = 99,864, and 3919 grid 
points for SIC, yielding m — 94,056. For direct comparison 
with this observational data set, the above CCSM model data 
sets have been interpolated from the native POP grid 1 ° grid 
to a common 1° latitude-longitude grid. After embedding, 
we are left with n' = 381 observation test data points. 


(out-of-sample) data. We use the notation , or 0®, 

to indicate if the NLSA mode is from SIC, SST, or joint 
SST and SIC variables, respectively. 



Fig. 1 Select NLSA eigenfunctions for CCSM 4 North Pacific SST and 
SIC data, pi, p3 are periodic (annual and semi-annual) modes, charac¬ 
terized by a peak in the power spectrum and oscillatory autocorrelation 
function; p9, pi4 are the two leading low-frequency modes, character¬ 
ized by a red power spectrum and slowly decaying montone autocor¬ 
relation function; pio, pis are intermittent modes, characterized by a 
broad peak in the power spectrum and decaying oscillatory autocorre¬ 
lation function. 


4 Results 

4.1 Low-frequency NLSA modes 

The eigenfunctions that arise from NLSA typically fall into 
one of three categories: i) periodic modes, which capture the 
seasonal cycle and its higher harmonics; ii) low-frequency 
modes, characterized by a red power spectrum and a slowly 
decaying autocorrelation function; and iii) intermittent modes, 
which have the structure of periodic modes modulated with a 
low-frequency envelope. The intermittent modes are dynam¬ 
ically important (Giannakis and Majda, 2012a), shifting be¬ 
tween periods of high activity and quiessence, but carry lit¬ 
tle variance, and are thus typically missed or mixed between 
modes in classical SSA. Examples of each of these eigen¬ 
functions arising from a CCSM4 data set with SIC and SST 
variables are shown in Figure 1, for years 100^99 of the 
pre-industrial control run. The corresponding out-of-sample 
extension eigenfunctions, defined through the Nystrom method, 
are shown in Figure 2, and are computed using years 500- 
899 of the same CCSM4 pre-industrial control run as test 


We perform our prediction schemes for five year time 
leads by applying the kernel ensemble analog forecast meth¬ 
ods discussed in Section 2.3 to the leading two low-frequency 
modes p®, p® from NLSA on North Pacific, shown in Fig¬ 
ure 1. The leading low-frequency modes extracted through 
NLSA can be though of as analogs to the well known PDO 
and NPGO modes, even in the multivariate setting. We have 
high correlations between the leading multivariate and uni¬ 
variate low-frequency NLSA modes, with corr ) = 

—0.9907 for our analog of the NPGO mode, and corr ) 

0.8415 for our analog of the PDO mode. 

As a benchmark, we compare against the simple con¬ 
stant persistence forecast y(t) = y(0), which can perform 
reasonably well given the long decorrelation time of these 
low-frequency modes, and in fact beats paramteric autore¬ 
gressive models as we will see below. We define the ground 
truth itself to be the out-of-sample eigenfunction calculated 
by Equation (7), so by construction, our predictions by the 
geometric harmonics method are exact at time lag T = 0, 
whereas predictions using Laplacian pyramids will have re¬ 
construction errors at time lag T = 0. 
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Snapshot of Time Series Autocorrelation Function Power Spectral Density 



Fig. 2 Out-of-sample extension eigenfunctions, computed via Equa¬ 
tion (7), for CCSM North Pacific SST and SIC data, associated with 
the same select (in-sample) eigenfunctions shown in Figure 1. 


4.1.1 Perfect model 

We first consider the perfect model setting, where the same 
dynamics generate the training data and test (forecast) data. 
This should give us a measure of the potential predictability 
of the methods. Snapshots of sample prediction trajectories 
along with the associated ground truth out-of-sample eigen¬ 
function are shown in Figure 3. In Figure 4, we see that for 
the leading low-frequency mode 0® (our NPGO analog), 
the ensemble based predictions perform only marginally bet¬ 
ter than persistence in the PC metric, but have improved 
RMSE scores over longer timescales. However with the sec¬ 
ond mode € (our PDO analog), we see a more noticeable 
gain in predictive skill with the ensemble analog methods 
over persistence. If we take 0.6 as a PC threshold (Collins, 
2002 ), below which we no longer consider the model to have 
predictive skill, we see an increase of about 8 months in pre¬ 
dictive skill with the ensemble analog methods over persis¬ 
tence, with skillful forecasts up to 20 months lead time. We 
note these low-frequency modes extracted from multivariate 
data exhibit similiar predictability (as measured by when the 
PC falls below the 0.6 threshold) than their univariate coun¬ 
terparts (0fj or results not shown). 

4.1.2 Model error 

To incorporate model error into our prediction experiment, 
we train on the CCSM3 model data, serving as our ‘model’, 
and then use CCSM4 model data as our test data, serving the 
role of our ‘nature’, and the difference in the fidelity of the 



Fig. 3 Sample snapshot trajectories of ensemble analog prediction re¬ 
sults via geometric harmonics (GH, blue, solid) and Laplacian pyramid 
(LP, blue, dashed), for the leading two low-frequency modes, in perfect 
model setting using CCSM4 data. The ground truth (T, black) is itself 
an out-of-sample extension eigenfunction, as shown in Figure 2. 


RMSE for NPGO predictions RMSE for PDO predictions 



Fig. 4 Kernel ensemble analog prediction results via geometric har¬ 
monics and Faplacian pyramid for the leading two low-frequency 
modes, in perfect model setting using CCSM4 data. Both out-of- 
sample extension methods outperform the persistence forecast (P) in 
both error metrics, particularly for in the (PDO) mode. 


two model versions represents general model error. In this 
experiment, our ground truth is an out-of-sample eigenfunc¬ 
tion trained on CCSM3 data, and extended using CCSM4 
test data. For the leading 0® (NPGO) mode, we see again 
marginal increased predictive performance in the ensemble 
analog predictions over persistence at short time scales in 
PC in Figure 5, but at medium to long time scales this im¬ 
provement has been lost (though after the score has fallen 
below the 0.6 threshold). This could be due to the increased 
fidelity of the CCSM4 sea ice model component over the 
CCSM3 counterpart, where using the less sophisticated model 
data for training leaves us a bit handicapped in trying to pre¬ 
dict the more sophisticated model data (Gent et al, 2011; 
Holland et al, 2012). The improvement in the predictive skill 
of (PDO) mode over persistence is less pronounced in 
the presence of model error than it was in the perfect model 
case shown in Figure 4. Nevertheless, the kernel ensemble 
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analog forecasts still provide a substantial improvement of 
skill compared to the persistence forecast, extending the PC 
= 0.6 threshold to 20 months. 


RMSE for NPGO predictions RMSE for PDO predictions 



Fig. 5 Kernel ensemble analog forecasting prediction results for the 
leading two low-frequency modes, with model error. CCSM3 model 
data is used for the in-sample data, and CCSM4 model data is used as 
the out-of-sample data. While the predictions still outperform persis¬ 
tence in the error metrics, there is less gain in predictive skill over as 
compared to the perfect model case. 


We can further examine the model error scenario by us¬ 
ing the actual obervational data set as our nature, and CCSM4 
as our model (Figure 6). Given the short observational record 
used, far fewer prediction realizations are generated, adding 
to the noisiness of the error metrics. The loss of predictabil¬ 
ity is apparent, especially in the € mode, where the kernel 
ensemble analog forecasts fail to beat persistence, and drop 
below 0.6 in PC by 10 months, half as long as the perfect 
model case. 

4.1.3 Comparison with Autoregressive Models 

We compare the ensemble analog predictions to standard 
stationary autoregressive models, as well as non-stationary 
models using the FEM-VARX framework of Horenko (2010b) 
discussed in Section 2.4, for the low-frequency modes 

generated from CCSM4 model (Figure 7) and the HadlSST 
observation (8) training data. For both sets of low-frequency 
modes, K — 2 clusters was judged to be optimal by the AlC 
as mentioned in Section 2.4-see Horenko (2010b); Metzner 
et al (2012) for more details. The coefficients for each clus¬ 
ter are nearly the same (autoregressive coefficent close to 1, 
similar noise coefficients), apart from the constant forcing 
coefficient /r, of almost equal magnitude and opposite sign, 
suggesting two distinct regime behaviors. In the stationary 
case, the external forcing coefficient p is very close to 0. 


RMSE for NPGO predictions 



Fig. 6 Ensemble analog prediction results for the leading two low- 
frequency modes, with model error. CCSM4 model data is used for the 
in-sample data, and HadlSST observational data is used as the out-of- 
sample data. Here the method produces little advantage over persis¬ 
tence, given the model error between model and nature, and actually 
fails to beat persistence for the € (PDO) mode. 


In the top left panel of Figures 7 and 8, we display snap¬ 
shots of the leading low-frequency mode (NPGO) tra¬ 
jectory reconstruction during the training period, for the sta¬ 
tionary (blue, solid), and non-stationary (red, dashed) mod¬ 
els, along with the cluster switching function associated with 
the non-stationary model. In both the CCSM4 model and 
HadlSST data sets, the non-stationary model snapshot is a 
better representation of the truth (black, solid), and the ben¬ 
efit over the stationary model is more clearly seen in the 
CCSM4 model data. Figure 7, which has the benefit of a 
400 year training period, as opposed to the shorter 16 year 
training period with the observational data set. 

In the prediction setting, however, the non-stationary mod¬ 
els, which are reliant on an advancement of the unknown 
cluster affiliation function F(f) beyond the training period, 
as discussed in Section 2.4, fail to outperform their station¬ 
ary counterparts in the RMSE and PC metrics (bottom pan¬ 
els of Eigures 7 and 8). In fact, none of the proposed re¬ 
gression prediction models are able to outperform the sim¬ 
ple persistence forecast in these experiments. As a measure 
of potential predition skill for the non-stationary models, 
whereby we mean that if perfect knowledge of the underly¬ 
ing optimal cluster switching function F(f) could be known 
over the test period, we have run the experiment of replacing 
the test data period with the training data set, and find ex¬ 
ceedingly strong predictive performance, with PC between 
0.7 and 0.8 for all time lags tested, up to 60 months. Simi¬ 
lar qualitative results for the second leading low-frequency 
mode (PDO) for each data set were found (not shown). 
This suggests that the Markov hypothesis, the basis for the 
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predictions of F(f), is not accurate, and other methods in¬ 
corporating more memory are needed. 


Sample Trajectories 




Fig. 7 Top left: Snapshot of the true CCSM4 (NPGO) trajectory 
(black) with reconstructed stationary (blue) and non-stationary K = 2 
(red) FEM-VARX model trajectories, along with corresponding model 
affiliation function F (t) for non-stationary case. Top right: Sample tra¬ 
jectories for various prediction methods: P2 = stationary, using FEM- 
VARX model coefficients from initial cluster; K1 = stationary autore¬ 
gressive; M1, M2 = FEM-VARX with predictions as described in Sec¬ 
tion 2.4.1, where Mils detenninistic evolution of the cluster affiliation 
Tt{l), and M2 uses realizations of K generated from the estimated prob¬ 
ability transition matrix T. Bottom panels: RMSE and PC as a function 
of lead time for various prediction methods, including PI = persistence 
as a benchmark. The dashed black line is for potential predictive skill of 
non-stationary FEM-VARX, where predictions were ran over the train¬ 
ing period using the known optimal model affiliation function F(f). 


4.2 Sea ice anomalies 

The targeted observables hereto considered for prediction 
have been data driven, and as such influenced by the data 
analysis algorithm. Hence there is no objective ground truth 
available when predicting these modes beyond the training 
period on which the data analysis was performed, and while 
in this case the NLSA algorithm was used, other data analy¬ 
sis methods such as EOFs would suffer the same drawback. 
We wish to test our prediction method on an observable that 
is objective, in the sense that it can be computed indepen¬ 
dently of the data analysis algorithm, for which we turn 
to integrated sea ice extent anomalies, as dehned in Equa¬ 
tion (10). We can clearly compute the time series of sea ice 
anomalies from the out-of-sample set directly (relative to the 
training set climatology), which will be our ground truth, 
and use the Laplacian pyramid approach to generate our 
out-of-sample extension predictions. This observable does 


Sample Trajectories 



Fig. 8 Top left: Snapshot of the true HadlSST (ji® (NPGO) trajectory 
(black) with stationary (blue) and non-stationary K — 2 (red) FEM- 
VARX model trajectories, along with the corresponding model affilia¬ 
tion function F(f) for non-stationary case. Top right: Sample trajecto¬ 
ries for various prediction methods-see Figure 7 for details of methods. 
Bottom panels: RMSE and PC as a function of lead time for various 
prediction methods. The dashed black line is for potential predictive 
skill of non-stationary FEM-VARX, where predictions were ran over 
the training period using the known optimal model affiliation function 

r(t). 


not have a tight expansion in the eigenfunction basis, so 
the geometric harmonics method of extension will be ill- 
conditioned, and thus not considered. We note that in this 
approach, there are reconstruction errors at time lag T = 0, so 
at very short time scales we cannot outperform persistence. 
We consider a range of truncation levels for the number of 
ensemble members used, which are nearest neighbors to the 
out-of-sample data point, as determined by the kernel func¬ 
tion. Using all available neighbors will likely overly smooth 
and average out features in forward trajectories, while us¬ 
ing too few neighbors will place to much weight on particu¬ 
lar trajectories. Indeed we hnd good performance using 100 
(out of total possible 4791). The top left panel of Figure 9 
shows a snapshot of the true sea ice extent anomalies, re¬ 
spectively, together with a reconstructed out-of-sample ex¬ 
tension using the Laplacian pyramid. To be clear this is not 
a prediction trajectory, but rather each point in the out-of- 
sample extension is calculated using Equation (9); that is, 
each point is a time-lead T = 0 reconstruction. 

The top right panel shows sample snapshots of predic¬ 
tion trajectories, restricting the ensemble size to the nearest 
10, 100, and then all nearest neighbors. Notice in particular 
predictions match the truth when the anomalies are close to 
0 , but then may subsequently progress in the opposite sign 
as the truth. As our prediction metrics are averaged over ini¬ 
tial conditions spanning all months, the difficulty the pre- 
























































































12 


Darin Comeau et al. 


3 


0 



-3 


-6 




time (months) 


PC 



Fig. 9 Top left: True sea ice cover anomalies plotted with the cor¬ 
responding Laplacian pyramid out-of-sample extention function. The 
other panels show prediction results using Laplacian pyramids for total 
North Pacific sea ice cover anomalies in CCSM4 data. The number of 
nearest neighbors (nN) used to form the ensemble was varied, and we 
find the best performance when the ensemble is restricted to the nearest 
100 neighbors, corresponding to about 2% of the total sample size. 


dictions have in projecting from a state of near 0 anomaly 
significantly hampers the ability for long-range predictabil¬ 
ity of this observable. 

In the bottom two panels of Figure 9 we have the aver¬ 
aged error metrics, and see year to year correlations mani¬ 
festing as a dip/bump in RMSE and PC in the persistence 
forecast that occurs after 12 months. After the first month 
lag time, the kernel ensemble analog forecasts overcome the 
reconstruction error and beat persistence in both RMSE and 
PC, and give about a 2 month increase in prediction skill (as 
measured by when the PC drops below 0.6) over persistence. 
We see the best performance restricting the ensemble size to 
100 nearest neighbors (about 2% of the total sample size) 
in both the RMSE and PC metrics, though this is marginal 
before the error metrics drop below the 0.6 threshold. 

Pushing the prediction strategy to an even more difficult 
problem, in Eigure 10 we try to predict observational sea ice 
extent anomalies using CCSM4 model data as training data. 
In this scenario, without knowledge of the test data clima¬ 
tology, the observation sea ice extent anomalies are defined 
using the CCSM4 climatology. In the top panel of Eigure 10 
we see the strong bias as a result, where the observational 
record has less sea ice than the CCSM model climatology, 
which has been taken from a pre-industrial control run. This 
strongly hampers the ability to accurately predict observa¬ 
tional sea ice extent anomalies using CCSM4 model ensem¬ 
ble analogs, and as a result the only predictive skill we see 
is from the annual cycle. 



Fig. 10 Prediction results for total observed (HadlSST data) North Pa¬ 
cific sea ice volume anomalies, using CCSM4 as training data. The ice 
cover function is out-of-sample extended via Laplacian pyramid, using 
100 nearest neighbors. 


5 Discussion 

We have examined a recently proposed prediction strategy 
employing a kernel ensemble analog forecasting scheme mak¬ 
ing use of out-of-sample extension techniques. These non- 
parametric, data-driven methods make no assumptions on 
the underlying governing dynamics or statistics. We have 
used these methods in conjunction with NLSA to extract 
low-frequency modes of variability from North Pacific SST 
and SIC data sets, both from models and observations. We 
find that for these low-frequency modes, the analog fore¬ 
casting performs at least as well, and in many cases bet¬ 
ter than, the simple constant persistence forecast. Predictive 
skill, as measured by PC exceeding 0.6, can be increased by 
up to 3 to 6 months for low-frequency modes of variability 
in the North Pacific. This is a strong advantage over tradi¬ 
tional parametric regression models, which were shown to 
fail to beat persistence. 

The kernel ensemble analog forecasting methods out¬ 
lined included two variations on the underlying out-of-sample 
extension scheme, each with its strengths and weaknesses. 
The geometric harmonics method, based on the Nystrom 
method, worked well for observables that are band-limited 
in the eigenfunction basis, in particular the eigenfunctions 
themselves. However for observables not easily expressed in 
such a basis, the Laplacian pyramid provides an alternative 
method based on a multiscale decomposition of the original 
observable. 

While the low-frequency eigenfunctions from NLSA were 
a natural preferred class of observables to target for predic¬ 
tion, we also studied the case of objective observables un¬ 
influenced by the data analysis algorithm. Motivated by the 
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Strong reemergence phenomena, we considered sea ice ex¬ 
tent anomalies as our target for prediction in the North Pa¬ 
cific. Using a shorter embedding window due to faster (sea¬ 
sonal) time scale dynamics, we obtain approximately a two 
month increase in predictive skill over the persistence fore¬ 
cast. It is also evident that when considering regional sea 
ice extent anomalies winds play a large role in moving ice 
into and out of the domain of interest, and as such additional 
consideration of the atmospheric component in the system 
could be included in the multivariate kernel function, despite 
having weaker low-frequency variability. 

An important consideration is that our prediction metrics 
are averaged over initial conditions ranging over all possible 
initial states of the system. As we saw clearly in the case 
of North Pacific sea ice volume anomalies, these prediction 
strategies can have difficulty with projecting from an initial 
state of quiessence, and can easily predict to the wrong sign 
of an active state, greatly hampering predictive skill. On the 
other hand we would expect predictive skill to be stronger 
for those initial states that begin in a strongly active state, 
or said differently, clearly in one climate regime, as oppose 
to in transition between the two. Future work will further 
explore conditional forecasting, where we either condition 
forecasts on the initial month, or the target month. Also ex¬ 
tending this analysis to the North Atlantic, another region of 
strong low-frequency variability, is a natural progression of 
this work. 

Acknowledgements The research of Andrew Majda and Dimitrios 
Giannakis is partially supported by ONR MURI grant 25-74200-F7112. 
Darin Comeau is supported as a postdoctoral fellow through this grant. 
The research of Dimitrios Giannakis is also partially supported by 
ONR DRI grant N00014-14-0150. 


References 

Alessandrini S, Delle Monache L, Sperati S, Nissen J (2015) 
A novel application of an analog ensemble for short-term 
wind power forecasting. Renewable Energy 76:768-781 
Alexander MA, Deser C, Timlin MS (1999) The reemer¬ 
gence of SST anomalies in the North Pacific Ocean. Jour¬ 
nal of climate 12(8):2419-2433 
Berry T, Cressman JR, Greguri?-Feren?ekZ, Sauer T (2013) 
Time-scale separation from diffusion-mapped delay co¬ 
ordinates. SIAM Journal on Applied Dynamical Systems 
12(2):618-649 

Blanchard-Wrigglesworth E, Bitz CM (2014) Characteris¬ 
tics of Arctic sea-ice thickness variability in GCMs. Jour¬ 
nal of Climate 27(21):8244-8258 
Blanchard-Wrigglesworth E, Armour KC, Bitz CM, 
DeWeaverE (201 la) Persistence and inherent predictabil¬ 
ity of Arctic sea ice in a GCM ensemble and observations. 
Journal of Climate 24(1):231-250 


Blanchard-Wrigglesworth E, Bitz C, Holland M (2011b) 
Influence of initial conditions and climate forcing on 
predicting Arctic sea ice. Geophysical Research Letters 
38(18) 

Branstator G, Teng H, Meehl GA, Kimoto M, Knight JR, 
Latif M, Rosati A (2012) Systematic estimates of initial- 
value decadal predictability for six AOGCMs. Journal of 
Climate 25(6): 1827-1846 

Bushuk M, Giannakis D, Majda AJ (2014) Reemergence 
mechanisms for North Pacific sea ice revealed through 
nonlinear Laplacian spectral analysis*. Journal of Climate 
27(16):6265-6287 

Chen N, Majda AJ (2015a) Predicting the cloud patterns 
for the boreal summer intraseasonal oscillation through a 
low-order stochastic model. Mathematics of Climate and 
Weather Forecasting 

Chen N, Majda AJ (2015b) Predicting the real-time mul¬ 
tivariate Madden-Julian oscillation index through a low- 
order nonlinear stochastic model. Monthly Weather Re¬ 
view (2015) 

Chen N, Majda A, Giannakis D (2014) Predicting the cloud 
patterns of the Madden-Julian oscillation through a low- 
order nonlinear stochastic model. Geophysical Research 
Letters 41(15):5612-5619 

Coifman RR, Lafon S (2006a) Diffusion maps. Applied and 
computational harmonic analysis 21(l):5-30 

Coifman RR, Lafon S (2006b) Geometric harmonics: a 
novel tool for multiscale out-of-sample extension of em¬ 
pirical functions. Applied and Computational Harmonic 
Analysis 21(l):31-52 

Collins M (2002) Climate predictability on interannual to 
decadal time scales: the initial value problem. Climate 
Dynamics 19(8):671-692 

Collins WD, Bitz CM, Blackmon ML, Bonan GB, Brether- 
ton CS, Carton JA, Chang P, Doney SC, Hack JJ, Hen¬ 
derson TB, et al (2006) The community climate sys¬ 
tem model version 3 (CCSM3). Journal of Climate 
19(11):2122-2143 

Day J, Tietsche S, Hawkins E (2014) Pan-Arctic and re¬ 
gional sea ice predictability: Initialization month depen¬ 
dence. Journal of Climate 27(12):4371^390 

Di Lorenzo E, Schneider N, Cobb K, Franks P, Chhak 
K, Miller A, McWilliams J, Bograd S, Arango H, Cur- 
chitser E, et al (2008) North Pacific Gyre Oscillation links 
ocean climate and ecosystem change. Geophysical Re¬ 
search Letters 35(8) 

Drosdowsky W (1994) Analog (nonlinear) forecasts of the 
Southern Oscillation index time series. Weather and fore¬ 
casting 9(l):78-84 

Frankignoul C, Hasselmann K (1977) Stochastic climate 
models, part ii application to sea-surface temperature 
anomalies and thermocline variability. Tellus 29(4):289- 
305 




14 


Darin Comeau et al. 


Franzke C, Crommelin D, Fischer A, Majda AJ (2008) 
A hidden Markov model perspective on regimes and 
metastability in atmospheric flows. Journal of Climate 
21(8):1740-1757 

Franzke C, Horenko I, Majda AJ, Klein R (2009) Sys¬ 
tematic metastable atmospheric regime identification 
in an AGCM. Journal of the Atmospheric Sciences 
66(7): 1997-2012 

Gent PR, Danabasoglu G, Conner LJ, Holland MM, Hunke 
EC, Jayne SR, Lawrence DM, Neale RB, Rasch PJ, 
Vertenstein M, et al (2011) The community climate sys¬ 
tem model version 4. Journal of Climate 24(19):4973- 
4991 

Ghil M, Allen M, Dettinger M, Ide K, Kondrashov D, Mann 
M, Robertson AW, Saunders A, Tian Y, Varadi F, et al 
(2002) Advanced spectral methods for climatic time se¬ 
ries. Reviews of geophysics 40(1):3-1 
Giannakis D, Majda AJ (2012a) Comparing low-frequency 
and intermittent variability in comprehensive climate 
models through nonlinear Laplacian spectral analysis. 
Geophysical Research Letters 39(10) 

Giannakis D, Majda AJ (2012b) Limits of predictability 
in the North Pacific sector of a comprehensive climate 
model. Geophysical Research Letters 39(24) 

Giannakis D, Majda AJ (2012c) Nonlinear Laplacian spec¬ 
tral analysis for time series with intermittency and 
low-frequency variability. Proceedings of the National 
Academy of Sciences 109(7):2222-2227 
Giannakis D, Majda AJ (2013) Nonlinear Laplacian spectral 
analysis: capturing intermittent and low-frequency spa- 
tiotemporal patterns in high-dimensional data. Statistical 
Analysis and Data Mining 6(3):180-194 
Giannakis D, Majda AJ (2014) Data-driven methods for dy¬ 
namical systems: Quantifying predictability and extract¬ 
ing spatiotemporal patterns. Mathematical and Computa¬ 
tional Modeling: With Applications in Engineering and 
the Natural and Social Sciences p 288 
Hasselmann K (1976) Stochastic climate models part i. The¬ 
ory. Tellus 28(6):473^85 

Holland MM, Bitz CM, Hunke EC, Lipscomb WH, 
Schramm JL (2006) Influence of the sea ice thickness dis¬ 
tribution on polar climate in CCSM3. Journal of Climate 
19(11):2398-2414 

Holland MM, Bailey DA, Briegleb BP, Light B, Hunke E 
(2012) Improved sea ice shortwave radiation physics in 
ccsm4: the impact of melt ponds and aerosols on arctic 
sea ice*. Journal of Climate 25(5): 1413-1430 
Horenko I (2010a) On clustering of non-stationary mete¬ 
orological time series. Dynamics of Atmospheres and 
Oceans 49(2): 164-187 

Horenko I (2010b) On the identification of nonstation¬ 
ary factor models and their application to atmospheric 
data analysis. Journal of the Atmospheric Sciences 


67(5): 1559-1574 

Horenko I (2011a) Nonstationarity in multifactor models 
of discrete jump processes, memory, and application to 
cloud modeling. Journal of the Atmospheric Sciences 
68(7): 1493-1506 

Horenko I (201 lb) On analysis of nonstationary categorical 
data time series: Dynamical dimension reduction, model 
selection, and applications to computational sociology. 
Multiscale Modeling & Simulation 9(4): 1700-1726 
Hunke E, Lipscomb W (2008) CICE: The Los Alamos sea 
ice model, documentation and software, version 4.0, Los 
Alamos National Laboratory tech. rep. Tech, rep., LA- 
CC-06-012 

Lorenz EN (1969) Atmospheric predictability as revealed 
by naturally occurring analogues. Journal of the Atmo¬ 
spheric sciences 26(4):636-646 
Majda AJ, Harlim J (2013) Physics constrained nonlinear 
regression models for time series. Nonlinearity 26(1):201 
Majda AJ, Yuan Y (2012) Lundamental limitations of ad 
hoc linear and quadratic multi-level regression models 
for physical systems. Discrete and Continuous Dynami¬ 
cal Systems B 17(4): 1333-1363 
Mantua NJ, Hare SR, Zhang Y, Wallace JM, Brands RC 
(1997) A Pacific interdecadal climate oscillation with im¬ 
pacts on salmon production. Bulletin of the american Me¬ 
teorological Society 78(6): 1069-1079 
Metzner P, Putzig L, Horenko I (2012) Analysis of persistent 
nonstationary time series and applications. Communica¬ 
tions in Applied Mathematics and Computational Science 
7(2): 175-229 

Newman M, Compo GP, Alexander MA (2003) ENSO- 
forced variability of the Pacific decadal oscillation. Jour¬ 
nal of Climate 16(23):3853-3857 
Nystrom EJ (1930) Uber die praktische auflosung von inte- 
gralgleichungen mit anwendungen auf randwertaufgaben. 
Acta Mathematica 54(1): 185-204 
Overland J, Rodionov S, Minobe S, Bond N (2008) North 
Pacific regime shifts: Definitions, issues and recent tran¬ 
sitions. Progress in Oceanography 77(2):92-102 
Overland JE, Percival DB, Mofjeld HO (2006) Regime 
shifts and red noise in the North Pacific. Deep Sea Re¬ 
search Part I: Oceanographic Research Papers 53(4):582- 
588 

Rabin N, Coifman RR (2012) Heterogeneous datasets rep¬ 
resentation and learning using diffusion maps and Lapla¬ 
cian pyramids. In: SDM, SIAM, pp 189-199 
Rayner N, Parker DE, Horton E, Lolland C, Alexander L, 
Rowell D, Kent E, Kaplan A (2003) Global analyses of 
sea surface temperature, sea ice, and night marine air tem¬ 
perature since the late nineteenth century. Journal of Geo¬ 
physical Research: Atmospheres (1984-2012) 108(D14) 
Sauer T, Yorke JA, Casdagli M (1991) Embedology. Journal 
of statistical Physics 65(3-4):579-616 




Data-driven prediction strategies for low-frequency patterns of North Pacific climate variability 


15 


Smith R, Jones P, Briegleb B, Bryan F, Danabasoglu G, Den¬ 
nis J, Dukowicz J, Eden C, Fox-Kemper B, Gent P, et al 
(2010) The Parallel Ocean Program (POP) reference man¬ 
ual: Ocean component of the Community Climate Sys¬ 
tem Model (CCSM). Los Alamos National Laboratory, 
LAUR-10-01853 

Takens F (1981) Detecting strange attractors in turbulence. 
Springer 

Tietsche S, Day J, Guemas V, Hurlin W, Keeley S, Matei 
D, Msadek R, Collins M, Hawkins E (2014) Seasonal 
to interannual Arctic sea ice predictability in current 
global climate models. Geophysical Research Letters 
41(3): 1035-1043 

Xavier PK, Goswami BN (2007) An analog method for real¬ 
time forecasting of summer monsoon subseasonal vari¬ 
ability. Monthly Weather Review 135(12):4149^160 

Zhao Z, Giannakis D (2014) Analog forecasting 
with dynamics-adapted kernels. arXiv preprint 
arXiv: 14123831 




